perf: Enhance Memory Management with Lock-Free Allocator, Preallocation, and Optimized Thread-Local Caching #2825

Open
wants to merge 8 commits into
base: main

beats-dh
Collaborator

Detailed Description:

1. Introduction of Static Preallocation:

  • Implementation of the preallocate method: LockfreePoolingAllocator now preallocates a fixed number of memory blocks (STATIC_PREALLOCATION_SIZE = 100) when it is initialized. This reduces the need for dynamic allocation at runtime, because early requests can be served from the pool.
  • Use of std::call_once: Preallocation is guarded by std::call_once and a std::once_flag, so it runs exactly once, even if several threads reach the initialization code concurrently.
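As a rough illustration of the once-only preallocation described above, here is a minimal sketch. `PreallocatedPool`, `ensurePreallocated`, and `available` are hypothetical names chosen for this example, and a mutex-guarded `std::vector` stands in for the lock-free list; it is not the code in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <new>
#include <vector>

// Value quoted in the description above.
constexpr std::size_t STATIC_PREALLOCATION_SIZE = 100;

// Hypothetical pool that models only the once-only preallocation step.
template <typename T>
class PreallocatedPool {
    static_assert(alignof(T) <= alignof(std::max_align_t),
                  "sketch assumes T is not over-aligned");

public:
    // Safe to call from many threads: the lambda body runs exactly once.
    void ensurePreallocated() {
        std::call_once(preallocFlag_, [this] {
            std::lock_guard<std::mutex> lock(mutex_);
            blocks_.reserve(STATIC_PREALLOCATION_SIZE);
            for (std::size_t i = 0; i < STATIC_PREALLOCATION_SIZE; ++i) {
                blocks_.push_back(::operator new(sizeof(T)));
            }
            assert(blocks_.size() == STATIC_PREALLOCATION_SIZE);
        });
    }

    // Number of raw blocks currently held by the pool.
    std::size_t available() {
        std::lock_guard<std::mutex> lock(mutex_);
        return blocks_.size();
    }

    ~PreallocatedPool() {
        for (void* block : blocks_) {
            ::operator delete(block);
        }
    }

private:
    std::once_flag preallocFlag_;
    std::mutex mutex_;           // stand-in for the lock-free list (assumption)
    std::vector<void*> blocks_;  // raw, uninitialized blocks of sizeof(T) bytes
};
```

Calling `ensurePreallocated()` a second time is a no-op, so the call can sit at the top of `allocate` without extra bookkeeping.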

2. Optimization of Thread-Local Cache:

  • New local cache limit (LOCAL_CACHE_LIMIT): The size of the thread-local cache is now derived from the number of threads available in the system (TOTAL_THREADS), so that per-thread caches do not consume excessive memory in environments with many threads. The default value is std::max(35 / TOTAL_THREADS, 5): the per-thread limit shrinks as the thread count grows, but never drops below 5 blocks.
  • Thread-Local Storage (thread_local): The local cache is stored in a thread_local variable, ensuring that each thread maintains its own separate cache, improving efficiency by avoiding contention.
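To make the cache-limit formula concrete, the sketch below evaluates it for a few thread counts and shows one way to hold a per-thread cache. The helper names (`localCacheLimit`, `detectTotalThreads`, `localCache`) are hypothetical. Integer division and the use of `std::thread::hardware_concurrency()` as the source of TOTAL_THREADS are assumptions, since the description does not say how the value is obtained.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Formula from the description: std::max(35 / TOTAL_THREADS, 5).
// Integer division is assumed; a thread count of 0 is treated as 1.
constexpr std::size_t localCacheLimit(std::size_t totalThreads) {
    return std::max<std::size_t>(
        35 / std::max<std::size_t>(totalThreads, 1), 5);
}

// Compile-time checks of the arithmetic.
static_assert(localCacheLimit(1) == 35, "35 / 1 = 35");
static_assert(localCacheLimit(4) == 8, "35 / 4 = 8 with integer division");
static_assert(localCacheLimit(16) == 5, "35 / 16 = 2, clamped up to 5");

// Assumption: TOTAL_THREADS comes from the hardware thread count.
// hardware_concurrency() may return 0 when the value is unknown.
inline std::size_t detectTotalThreads() {
    const unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1 : n;
}

// Each thread that calls this gets its own vector, so no locking is needed.
inline std::vector<void*>& localCache() {
    thread_local std::vector<void*> cache;
    return cache;
}
```

With this formula the limit is already at its floor of 5 for seven or more threads, so the scaling only has an effect on machines with few cores.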

3. Improvements in the allocate Function:

  • Simplification of control flow: Empty blocks have been removed and the allocation checks reorganized. allocate first tries the thread-local cache; if that is empty, it tries to retrieve a block from the lock-free shared list; only if that also fails is a new memory block dynamically allocated.
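The allocation order described above can be sketched as a three-tier lookup. This is an illustration under stated assumptions, not the PR's code: `SharedList` is a mutex-guarded stand-in rather than a lock-free structure, the local cache is passed in as a parameter instead of being `thread_local`, and the `Source` enum exists only so the example can report which tier served the request.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <new>
#include <vector>

// Which tier served the request (for illustration only).
enum class Source { LocalCache, SharedList, Heap };

// Mutex-guarded stand-in for the shared list. NOT lock-free.
struct SharedList {
    std::mutex mutex;
    std::vector<void*> blocks;

    bool tryPop(void*& out) {
        std::lock_guard<std::mutex> lock(mutex);
        if (blocks.empty()) {
            return false;
        }
        out = blocks.back();
        blocks.pop_back();
        return true;
    }

    void push(void* block) {
        std::lock_guard<std::mutex> lock(mutex);
        blocks.push_back(block);
    }
};

// Tier 1: local cache. Tier 2: shared list. Tier 3: fresh heap block.
inline void* allocateBlock(std::vector<void*>& localCache, SharedList& shared,
                           std::size_t blockSize, Source& source) {
    assert(blockSize > 0);
    if (!localCache.empty()) {
        void* block = localCache.back();
        localCache.pop_back();
        source = Source::LocalCache;
        return block;
    }
    void* block = nullptr;
    if (shared.tryPop(block)) {
        source = Source::SharedList;
        return block;
    }
    source = Source::Heap;
    return ::operator new(blockSize);
}
```

Only the third tier touches the global heap, which is the path the preallocation and caching are meant to make rare.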

4. Improvements in the deallocate Function:

  • Enhanced memory management: deallocate now keeps freed blocks for reuse. It returns a block to the thread-local cache first and falls back to the lock-free shared list only when the local cache cannot take it.
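A matching sketch of the deallocation path follows. It assumes that "cannot take it" means the local cache has reached LOCAL_CACHE_LIMIT, which the description implies but does not state. `SharedFreeList` and `deallocateBlock` are hypothetical names, and the shared list is again a mutex-guarded stand-in rather than a lock-free structure.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Mutex-guarded stand-in for the shared list. NOT lock-free.
struct SharedFreeList {
    std::mutex mutex;
    std::vector<void*> blocks;

    void push(void* block) {
        std::lock_guard<std::mutex> lock(mutex);
        blocks.push_back(block);
    }

    std::size_t size() {
        std::lock_guard<std::mutex> lock(mutex);
        return blocks.size();
    }
};

// Returns true if the block stayed in the caller's local cache,
// false if it was handed to the shared list instead.
inline bool deallocateBlock(void* block, std::vector<void*>& localCache,
                            std::size_t localCacheLimit,
                            SharedFreeList& shared) {
    assert(block != nullptr);
    if (localCache.size() < localCacheLimit) {
        localCache.push_back(block);  // cheap: no synchronization needed
        return true;
    }
    shared.push(block);  // cache full: make the block visible to other threads
    return false;
}
```

Spilling to the shared list rather than freeing keeps the block available to threads whose own caches are empty.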

Reason for Replacing std::make_shared with This New Implementation:

1. Precise Control of Allocation and Deallocation:

The primary reason for replacing std::make_shared with the new implementation using LockfreePoolingAllocator is the need for finer and more efficient control over memory allocation and deallocation. std::make_shared combines the object and its control block (which holds the reference counts) into a single allocation, which is efficient in terms of memory usage, but it provides no way to supply a custom allocator and therefore cannot benefit from optimizations like thread-local caching and memory block preallocation.
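For context, the standard library's way to route a shared_ptr allocation through a custom allocator is std::allocate_shared, which, like std::make_shared, typically places the object and control block in one allocation. The sketch below uses a hypothetical `CountingAllocator` in place of LockfreePoolingAllocator to show the hook; the `Item` type and `makePooledItem` helper are also invented for the example, and whether this PR uses std::allocate_shared is an assumption.

```cpp
#include <cstddef>
#include <memory>
#include <new>

// Global counter so the example can observe allocator use.
inline std::size_t& allocationCount() {
    static std::size_t count = 0;
    return count;
}

// Hypothetical minimal allocator; a pooling allocator would plug in the same way.
template <typename T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() = default;
    template <typename U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        ++allocationCount();
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }

    void deallocate(T* p, std::size_t) noexcept {
        ::operator delete(p);
    }
};

// All instances are interchangeable, so they always compare equal.
template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
    return true;
}
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
    return false;
}

// Hypothetical payload type.
struct Item {
    int id;
    explicit Item(int i) : id(i) {}
};

// Same call shape as std::make_shared<Item>(id), but the memory for the
// object and its control block is requested from the supplied allocator.
inline std::shared_ptr<Item> makePooledItem(int id) {
    return std::allocate_shared<Item>(CountingAllocator<Item>{}, id);
}
```

The allocator is rebound internally to the combined object-plus-control-block type, so a pooling allocator used this way has to size its blocks for that type rather than for `Item` alone.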

2. Optimization for Multithreaded Environments:

In a multithreaded environment, the new implementation allows each thread to maintain its own local memory block cache, reducing contention when accessing shared resources. std::make_shared offers no such per-thread caching: every call goes through the global allocator, which can become a point of contention when objects are allocated and deallocated frequently.

3. Reusing Memory Blocks:

With the new implementation, memory blocks can be reused from both a thread-local cache and a lock-free shared list, depending on availability. This not only improves performance by minimizing dynamic allocations, but also offers more predictable and efficient memory management, especially under high load.

4. Reduction of Memory Fragmentation:

The new allocation strategy is expected to reduce memory fragmentation, because blocks are preallocated and reused instead of being repeatedly returned to and requested from the heap. std::make_shared offers no comparable control over block reuse.

5. Flexibility and Extensibility:

The new approach also offers greater flexibility for future optimizations and adjustments according to the application's needs. The LockfreePoolingAllocator implementation allows customizations such as the amount of preallocated memory, the size of the local cache, and the behavior of the lock-free list, aspects that cannot be easily managed with std::make_shared.

In summary, switching to this new implementation provides greater control and efficiency in memory management, improving application performance in environments where scalability and high performance are crucial.


@jhogberg jhogberg mentioned this pull request Sep 12, 2024
Contributor

This PR is stale because it has been open 45 days with no activity.

@github-actions github-actions bot added Stale No activity and removed Stale No activity labels Oct 19, 2024

Contributor

This PR is stale because it has been open 45 days with no activity.

@github-actions github-actions bot added the Stale No activity label Nov 30, 2024
@github-actions github-actions bot removed the Stale No activity label Dec 7, 2024